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ABSTRACT 


A  numerical  algorithm  is  given  for  implementing  a  nonparametric  maxi¬ 
mum  penalized  likelihood  estimator  similar  to  those  proposed  by  Good  and 
Gaskins  and  those  proposed  by  de  Montricher,  Tapia  and  Thompson.  It  is 
shown  how  the  resulting  nonlinear  constrained  optimization  problem  may  be 
effectively  solved  by  using  Tapia’s  approach  to  Newton's  method  for  con¬ 
strained  problems. 


1.  Introduction,  de  Montricher,  Tapia  and 
Thompson  demonstrated  that  the  standard 
histogram  was  an  unstable  maximum  likelihood 
density  estimator  and  considered  maximum 
penalized  likelihood  estimators  similar  to 
those  previously  considered  by  Good  and 
Gaskins  (1971).  Specifically  suppose  we  are 
given  the  random  sample  x. , . . . ,x^  €  (a, b)  . 
Let  Ho(s,b)  consist  of  tne  functions  f 
defined  on  (a,b)  with  the  property  that 
f(a)  «  f(b)  =0  and  f'  is  a  member  of 
L2(a,b).  Estimate  the  density  function  which 
gave  ri’e  to  the  random  sample  x,,...,x^ 
by  the  solution  of  the  constrained  optimiza¬ 
tion  problem 

(1.1)  max  L(f )  ; f  € H^(a,b)  ,  f>0  and 
.b 

J  f(x)dx  =  l, 

“  a 


1. e.,  a  polynomial  of  degree  two  plus  a  spline 
of  degree  one.  We  now  give  a  numerical  algo¬ 
rithm  for  approximating  this  monospline. 

2.  The  Discrete  Problem.  For  given  n,  con¬ 
sider  the  mesh  t0,...,tItl_^  where  ti  =  a+ih, 

J  =  o,...,n+l  with  h  =  (b-a)/fc»  +  1)  .  Let 

Hi  denote  the  vector  space  of  all  continuous 
piecewise  linear  functions  which  have  knots  at 


and  vanish  at 


and  b.  For 


p€Hl  let  yi  =  p(ti),  i  =  0, . . . ,n+  1. 
Yo  =  yiri-1  =  0  and 

(2.1)  p(x)>0»yi>0,  i»l,...,n 

(2*2)  J  p(x)dx  =  h  £  y . 

8  1=0 

(2-3)  J*b  p'002dx  =  r  £  (y1+i  -  y.)2  . 


(1.2)  L(f)=nif(xi)exp(-0fJJf-(x)|  dx) ,  (or  >0).  ^  ^  in  ^h, 

The  functional  L  in  (1.2)  is  called  the  (2.5)  v .  =#of  x.  in  [t,  ,  +  7,  tr-t-l), 

penalized  likelihood  and  the  solution  of  1  i 

(1.1)  is  colled  the  maximum  penalized  llkell-  i“2,...,n-l 

hpod  estimator  based  on  the  random  sample  (2.6)  vn  =  #of  x,  in  (t  -  +  r,,b). 

S.1 1  ■ . .  ,x„  .  de  Montricher,  Tapia  and  Thompson  n.  2 

(1373)proved  that  problem  (1.1)  has  a  unique  Wa  sha11  as9u”e  that  we  have  anou8h  data  90 
solution  and  is  a  monosollne  of  degree  two,  that  '>1>0Vi  .  Our  finite  dimensional 
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approximation  to  problem  (1-0  is 

(2.7)  max  L<y);y1=:0Vi  and  =  h'1 

Whcrc  n  ^  n  , 

(2.8)  L(y)  -  yt  exp(-“h  s  <yi+ryi>  >• 

1»1  i“l 

Clearly  (2.7)  i®  a  constrained  optimization 
problem  In  Rn  • 

Pro[s>3 trion  2.1.  The  constraints  y  >0  of 
problem  (2.7)  are  not  active  at  thesolution. 

Proof.  If  y*=(hN)  1(1,...,1),  then  y* 
satisfies  all  the  constraints  of  problem 
(2.7)  and  L(y)>0.  Moreover,  if 

y  *  (y! . yn)  is  such  that  yt  =  0  for 

some  i,  then  L(y)  =  0.  This  proves  the 

proposition. 

It  follows  that  we  can  obtain  the  solu¬ 
tion  of  problem  (2.7)  by  solving 

(2.9)  min(-log  L(y));  £  y,  “h"1 

i=l 

where  from  (2.8)  we  see  that 

(2.10)  -log(L(y))  =  -  I  v.  log(y  ) 

1=1  1 

+  oh"1  £  (y  -  y.)2  . 

1=1 

3.  The  Algorithm.  The  Lagrangian  for  problem 
(2.9)  is 

(3.1)  i(y,X)  =  -  8  v  log(yi) 

i=l  1 

-w*1"1  2  (y1+1  -  y,)2 

i=0  1 

n  -i 

+X(  £y.-h‘)  . 

1=1 

The  gradient  of  the  Lagrangian  Is 

(3.2)  Vl(y,\)  =  (...,-Vj^y"1 

+  2<yh"1(-yi+1+2y1-yi_1) 

+  X,...)T 

and  the  Hessian  of  the  Lagrangian  is  the 
diagonally  dominant  tridiagonal  matrix 

h  ■>.  \ 


(3.3)  V2(.<y,X)- 


d  d2 
-I  o 


\\\ 


.  .1  / 

where  *  dj  *  -2 oh  and 

do"4oh*1+  VyjV  • 

It  therefore  follows  thet  Tapia's  (1974), 
(1976)  approach  to  Newton's  method  for  con¬ 
strained  problems  is  s  natural  one  for  this 
problem  and  the  operation  count  will  be 


0(n)  per  iteration  instead  of  the  usual 
O(n')  expected  from  Newton's  method, 
n  -) 

Let  g(y)  =  £  y .  -  h  .  Then 
1=1 

U  “  '7g(y)  “  (1, . . .  ,1)T  .  We  use  (,)  to  de¬ 
note  the  inner  product  in  Rn  . 

The  Newton-like  Algorithm. 

Step  1.  Determine  or>0,  *>0,  y°,X°  and  set 
k;=0  . 

Step  2.  Calculate 

XkH  -  <U,V2L(yk,Xk)"1U>"?g(yk)  -  <U,'72L(yk.xS"1U>) 
and 

yk+1  =  yk  -  V2t(yk,Xk)  "1VL(yk,Xkfl) 

Step  3.  If  ll'7(,(xk,  X*1)  II  <  e,  then  stop. 

If  not,  then  set  k:  =k+l  and  go  to 
Step  2. 

Initialization  values  could  be 
_ -4 


y°  *  W^i . i) 

and  ,  n  , 

X  =  -2(nh)  (yi+y^d-  2  v^ny^  . 

i=*l 

This  X°  is  given  by  the  projection  formula 
in  Tapia  (1976). 

For  a  complete  description  of  this  algo¬ 
rithm  and  related  quasi-Newton  methods  for 
constrained  optimization  the  reader  is  referred 
to  Tapia  (1976) . 

Proposition  3.1.  The  above  algorithm  is  locally 
quadratically  convergent  and  requires  only 
0(n)  operations  per  iteration. 

4,  Some  Numerical  Examples.  Although  for 
reasons  of  conciseness  it  was  appropriate  to 
develop  the  above  discrete  maximum  penalized 
likelihood  algorithm  using  the  integral  of 
the  square  of  the  first  derivative  in  the 
penalty  team,  we  have  found  by  experience 
that  the  slightly  more  complicated  second  de¬ 
rivative  approach  gives  less  locally  "rough" 
estimators.  Namely  we  consider  the  problem 

(4.1)  Max  L(f);f  €H2(a,b),  f^O  and 

/  f (xldx  -  1 

"a 

where  b 

(4.2)  L(f)  *  ft  f(x  )exp[-af  |f"(x)  l2dx),(0(>0). 

i-1  « 

The  details  of  the  algorithm  to  approximate 
the  solution  to  this  problem  are  omitted, 
since  they  are  very  similar  to  the  argument 
In  Section  2.  The  operation  count  is  still 
0(n)  per  iteration. 

In  Figure  1,  we  demonstrate  the  solution 
to  (4,1)  using  a  random  sample  of  size  20  frwm 
the  standard  normal  distribution  with  mean  0 
and  variance  1.  In  Figure  2,  we  show  the 


estimator  based  on  a  sample  of  size  100.  In 
Figures  3  and  4  we  show  the  D.M.P.L.E. esti¬ 
mators  for  the  50-50  mixture  of  two  normal 
distributions,  both  having  variance  1  and 
with  means  at  -1.5  and  +1.5  for  samples  of 
size  25  and  100  respectively. 


f 


i 

I 


One  comforting  feature  of  the  maximum 
penalized  likelihood  procedure  is  the  rela- 
tivdy  robust  quality  of  the  estimator  in  that 
changes  of  the  optimal  Of  with  N  and  from 
distribution  to  distribution  tend  not  to  be 
traumatic,  and  that  a  rough  and  ready  guess 
for  a  (e.g.,  10)  is  frequently  satisfactory. 
In  Figures  5  and  6  we  show  an  estimate  for 
the  Gaussian  mexture  mentioned  above  for  a 
sample  size  of  300  and  a  values  of  10 
and  .1. 

If  the  density  to  be  estimated  is  de¬ 
noted  by  j:(.)  and  the  D.M.P.L.E.  is  de¬ 
noted  by  f(-),  then  we  consider  as  one 
measure  of  estimate  quality  the  average  inte¬ 
grated  mean  square  error 

(4.3)  IMSE=J(f(x)  -  f(x))2f(x)dx  .  • 

I.M.S.E.'s  for  various  Of  and  N  are  given 
in  Table  1  for  the  standard  normal,  the 
50-50  normal  mixture  mentioned  above,  the 
t  distribution  with  5  degrees  of  freedom 
and  the  F  distribution  with  (10,10)  de¬ 
grees  of  freedom. 
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TABIE  1 

Average  I.M.S.E.  of  the  D.M.P.L.E.  for 
Of  Perturbed  by  a  Factor  of  Two.  Divide 
Of  by  10  for  the  F1Q  Samples. 

I.M.S.E. 

for 


Sample 

o=5 

Of  -  10 

0  =  20 

N(0.1)  N  =25 

.00242 

.00267 

.00427 

N(0.1)  N  = 100 

.00093 

.00079 

.00089 

N(0, 1)  N  =  400 

.00037 

.00033 

.00035 

N(0.1)  N  =  800 

.00028 

.00022 

.00019 

Bimodal  N  =  25 

.00197 

.00159 

.00152 

Bimodal  N  =  100 

.00070 

.00054 

.00171 

Bimodal  N  =  400 

.00030 

.00024 

.00022 

tj  N  =  25 

.00297 

.00282 

.00350 

tj  N  =  100 

.00092 

.00084 

.00101 

t5  N  =  400 

.00039 

.00032 

.00030 

F10,10  N  =  25 

.03208 

.03865 

.05519 

F10,10N  =  10° 

.00996 

.01390 

.02411 

F10ii0N=400 

.00292 

.00450 

.00740 
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